-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Configure VPA for reconciler, if enabled #691
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
aff42e9
to
f724628
Compare
75cc4c5
to
07e3b89
Compare
788c243
to
e005fb5
Compare
279e471
to
30179cc
Compare
8390ecb
to
963434f
Compare
} | ||
} | ||
return nil | ||
} | ||
|
||
func (r *reconcilerBase) validateAnnotations(_ context.Context, rs client.Object) error { | ||
autoscalingStrategy := reconcilerAutoscalingStrategy(rs) | ||
if autoscalingStrategy != metadata.ReconcilerAutoscalingStrategyAuto && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably would be cleaner with a switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's better, but i changed it. lmk.
case metadata.ReconcilerAutoscalingStrategyDisabled: | ||
// delete if VPA is installed | ||
if !vpaEnabled { | ||
r.logger(ctx).Info("Managed object delete skipped - not enabled", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: doesn't seem like a skip, given there's nothing to delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the log here.
Name: reconcilerRef.Name, | ||
} | ||
var updateMode autoscalingv1.UpdateMode | ||
if autoscale { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: I feel like using the boolean here doesn't help much with readability. I feel it would be clearer just comparing directly to the declared consts (e.g. with a switch
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Boolean is different than just strategy == Auto here. The Boolean means Auto + APIService exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, this is a different place than I thought you were commenting on. Fixed it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the metrics-server
component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPA requires the metrics-server. The metrics-server watches resource usage and exposes them as custom metrics.
38f0141
to
66ac195
Compare
278ac82
to
b0bddf2
Compare
- Configure VPA for reconciler deployment using annotation on RootSync/RepoSync: `configsync.gke.io/reconciler-autoscaling-strategy: Auto` - Auto - evict and recreate pods to apply recommended resource values, as needed. - Recommend - monitor and record recommended resource values for each reconciler, but don't automatically apply them. - Disabled - Do not apply any VPA config, and delete it if it exists with the same name as the reconciler. - VPA disabled by default (opt-in for preview and testing) - When VPA is enabled, set smaller resource requests/limits for smaller footprint on initial install. Adding limits helps hasten VPA adjustments by causing OOMKills, instead of waiting for the VPA to evict the pod. - Move regular (non-VPA) defaults out of a ConfigMap and into the reconciler-manager code, next to the new VPA resource defaults. This should make them easier to keep in sync. - test: Install VPA on kind when --vpa is specified - test: Enable the VPA addon in GKE when creating clusters when --vpa is specified - test: Rewrite some e2e tests to handle resource defaults - test: Log reconciler pod resources on test failure to help debug VPA.
Hopefully this leads to fewer oomkills
Wait for maagement conflict metric after manual change, with the current sync labels.
/hold This PR is on hold until the metrics-server is fixed to be highly available. When the metrics-server in unhealthy, config sync's api discovery breaks, which makes tests fail. |
curious if this is still WIP or if it's even relevant. If it is consider closing the PR and keeping the private branch or converting to draft. |
RootSync/RepoSync:
configsync.gke.io/reconciler-autoscaling-strategy: Auto
values, as needed.
each reconciler, but don't automatically apply them.
exists with the same name as the reconciler.
smaller footprint on initial install. Adding limits helps
hasten VPA adjustments by causing OOMKills, instead of waiting
for the VPA to evict the pod.
reconciler-manager code, next to the new VPA resource defaults.
This should make them easier to keep in sync.
--vpa is specified
VPA.
Design: go/config-sync-reconciler-autoscaling
Bug: b/289388701
Depends On:
Notes: